Abstract

Pummelo cultivars are usually difficult to identify morphologically, especially when fruits are unavailable. The problem was 
addressed in this study with the use of two methods: high resolution melting analysis of SNPs and sequencing of DNA 
segments. In the first method, a set of 25 SNPs with high polymorphic information content were selected from SNPs 
predicted by analyzing ESTs and sequenced DNA segments. High resolution melting analysis was then used to genotype 
260 accessions including 55 from Myanmar, and 178 different genotypes were thus identified. A total of 99 cultivars were
assigned to 86 different genotypes since the known somatic mutants were identical to their original genotypes at the 
analyzed SNP loci. The Myanmar samples were genotypically different from each other and from all other samples, 
indicating they were derived from sexual propagation. Statistical analysis showed that the set of SNPs was powerful enough 
for identifying at least 1000 pummelo genotypes, though the discrimination power varied in different pummelo groups and 
populations. In the second method, 12 genomic DNA segments of 24 representative pummelo accessions were sequenced. 
Analysis of the sequences revealed the existence of a high haplotype polymorphism in pummelo, and statistical analysis 
showed that the segments could be used as genetic barcodes that should be informative enough to allow reliable 
identification of 1200 pummelo cultivars. The high level of haplotype diversity and an apparent population structure shown
by DNA segments and by SNP genotypes, respectively, were discussed in relation to the origin and domestication of the 
pummelo species. 



Citation: Wu B, Zhong G-y, Yue J-q, Yang R-t, Li C, et al. (2014) Identification of Pummelo Cultivars by Using a Panel of 25 Selected SNPs and 12 DNA 
Segments. PLoS ONE 9(4): e94506. doi:10.1371/journal.pone.0094506

Editor: Shu-Biao Wu, University of New England, Australia 

Received September 23, 2013; Accepted March 17, 2014; Published April 14, 2014

Copyright: © 2014 Wu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted 
use, distribution, and reproduction in any medium, provided the original author and source are credited. 

Funding: This work was supported by the International Science & Technology Cooperation Program of China (Grant No: 2012DFA30610), the National Program
on Key Basic Research Project (Grant No: 2011CB100600), and the National Natural Science Foundation of China (Grant No: 30971992). The funders had no role in
study design, data collection and analysis, decision to publish, or preparation of the manuscript. 

Competing interests: The authors have declared that no competing interests exist. 

* E-mail: gy_zhong@163.com 



Introduction

Pummelo [Citrus maxima (J. Burman) Merrill] is an important 
cash crop that is widely cultivated and consumed around the world,
particularly in China and Southeast Asian countries.
Pummelo cultivars are easily misidentified for the following
reasons. First, there are hundreds of pummelo cultivars, and the
origin and morphological traits of many local cultivars have been
poorly documented and are hence largely unknown to the outside
world. Second, pummelo cultivars have traditionally been distinguished
by a few morphological traits, mostly fruit traits, so trees
without fruit are often difficult to distinguish. Third, unlike many
polyembryonic species in Citrus, true pummelos are monoembryonic
and could have been propagated both sexually and asexually
during domestication, which further complicates cultivar
identification. A reasonable solution is to use DNA molecular
data to help identify the cultivars.

Cultivated pummelos have been reported to be highly diverse in 
Southeast Asia, China and India by different researchers based on 
phenotypes of their fruits [1,2,3,4]. Barkley et al. [5] analyzed 370 
citrus accessions including 89 pummelo individuals on 24 SSRs
and identified a heterozygosity of 0.4238 for the pummelo
accessions. Traditionally, pummelo cultivars in China were 
classified into three groups, i.e., the Wendan group, the 
Shatianyou group and the hybrid group (interspecific). A study 
based on SSR and AFLP markers identified a high genetic 
diversity existing in the 110 analysed pummelo cultivars collected
mainly from China, and showed that the Shatianyou group
members were closely related to each other whereas members in
the Wendan group were more diverse [6]. In the study of Ollitrault et
al. [7] using Clementine (Citrus clementina Blanco) EST-SNPs, 10
pummelos were assigned to 8 different genotypes. These studies 
suggested that it should be possible to identify pummelo cultivars 
using molecular data. 

SNPs have been widely used for identity analysis in recent years
[8,9,10]. SNPs have been developed for variety identification in melon
(Cucumis melo L.) [9], cereal [11] and capsicum (Capsicum annuum L.)
[12]. The use of SNPs in tree cultivar identification has also been
reported. It was claimed that several thousand grapevine (Vitis
vinifera L.) cultivars could be distinguished with a set of
48 SNPs [13]. SNP-based cultivar identification has also been applied in
olive (Olea europaea L.) [14] and Eucalyptus [15]. Another popular
DNA marker in individual or cultivar identification is the SSR
[16,17,18]. Generally speaking, a SSR has a higher polymorphic
information content (PIC) than a SNP, as SSRs are often
multi-allelic while SNPs are mostly bi-allelic [19,20]. A comparative
study showed that a set of 23 selected SNPs was similar in
discrimination power to a set of 13 SSRs for soybean (Glycine max
(L.) Merr.) cultivar identification [8]. However, SNPs have several
advantages over SSRs. First, the nomenclature of SNPs is much
simpler than that of SSRs, which makes analysis and sharing of
results much easier [21,13]. Second, SNPs greatly outnumber
SSRs in genomes, and the study of Tokarska et al. [20] showed
that when too few polymorphic SSRs could be obtained because of a
low level of genetic diversity, it was still possible to find enough
SNPs for identity analysis. Third, SNPs are genetically
more stable than SSRs, since SSR loci are prone to unequal
recombination during meiosis or slippage during DNA replication,
through which new alleles or homoplasy can be generated [22,23]. In addition, SNP
genotyping results have been reported to be highly consistent
among different laboratories using different genotyping techniques 
[24]. 

High resolution melting analysis (HRMA) has been proven to be 
an efficient, moderately high throughput, and highly accurate 
genotyping method [25,26]. HRMA allows tens to hundreds of 
samples to be analyzed in a single plate in about 30 minutes,
depending on the type of machine used [27]. These
advantages make it well suited to quick identification of
cultivars.

DNA segments represent another valuable method for genetic 
variability studies. Segments from mitochondrial, chloroplast or 
nuclear genomes have been utilized for species identification in 
DNA barcoding technology [28,29,30]. However, the segments 
used in barcoding are intentionally selected for their possessing 
enough inter-specific rather than intra-specific polymorphism and 
are therefore not so suitable for individual identification within a 
species. But in theory, DNA segments could still be applicable for 
identifying individuals or cultivars if enough independent 
polymorphic segments were used. In this respect, chloroplast 
and mitochondrial segments are less effective than nuclear DNA 
segments since they are inherited asexually [31]. In contrast, a 
polymorphic nuclear locus containing a number of alleles 
(haplotypes) could generate a larger number of genotypes in a 
diploid organism. Thus, combinations between different such loci 
would be numerous enough to allow any cultivar to be identified. 

In this study, we tried to use SNPs and a limited number of 
DNA segments to achieve a quick and reliable identification of 
pummelos. 

Materials and Methods 

Plant materials 

All the plant materials were acquired with permissions from 
their owners or preservers abiding by the laws in China and in 
Myanmar. Pummelo leaves collected from Myanmar were private
possessions, and the owners (Kareng ma kam, Changhai Leng,
Kachin State, Myanmar) agreed to the use of the materials in
research. The plant materials used in this study did not involve
endangered or protected species. 

Leaves of 205 citrus trees were collected from the National 
Citrus Germplasm Repository (Chongqing) and the Citrus
Germplasm collection block (Guangdong) (Table S1). The samples
included 99 pummelo cultivars, 26 unknown accessions and 24
hybrids between pummelo and other citrus species (referred to
as CUL, UNKNOWN and HYBRID, respectively),
a Honghe papeda (Citrus hongheensis Y. M. Ye, X. D. Liu, S. Ding,
et M. Q. Liang) and a Honghe papeda hybrid. Two and three
individuals were sampled for 30 and 12 cultivars, respectively,
as marked in Table S1. Leaves from 55 pummelo trees (referred to as MYANMAR)
were collected at the border between Myanmar and Yunnan
province of China, of which 19 trees were located around
N23°52.275' E97°41.285' and the remaining 36 trees were located
around N24°3.173' E97°35.349'. For convenience, all accessions
and individuals will be collectively referred to as accessions.
Genomic DNA was extracted from leaves using the EasyPure Plant
Genomic DNA Extraction Kit (TransGen Biotech, Beijing, China)
following the protocol supplied with the kit.

Identification and selection of pummelo SNPs 

Two strategies were used to obtain pummelo SNPs. First, 
pummelo haplotypes were inferred from the EST sequences of 
sweet orange [Citrus sinensis (L.) Osbeck], sour orange (Citrus
aurantium L.) and grapefruit (Citrus paradisi Macfadyen) that are
known to be derived from crosses between pummelo and other 
citrus species [32,33,34,35], and the homologous haplotypes were 
compared pairwise to identify the pummelo intra-specific SNPs 
[36]. Second, direct sequencing of pummelo genomic segments
was used to obtain more SNPs. Briefly, genomic segments of the
Zeaxanthin epoxidase (ZEP), Phytoene synthase (PSY), Phytoene desaturase
(PDS) and β-carotene hydroxylase (CHX) genes, which were 1604, 837,
764 and 2200 bp long, respectively, were amplified by PCR using the
respective gene-specific primers from Guanximiyou and Lingnanshatianyou
[36]. The PCR amplicons were then cloned using the
pEASY-T1 Simple Cloning Kit (TransGen Biotech, Beijing,
China). Two or more positive clones were sequenced for each
gene segment from both cultivars using the Sanger method. The
synthesis of oligonucleotide primers and Sanger sequencing were
carried out by Beijing Genomics Institute (BGI, Shenzhen, China).
Sequences were aligned using Clustal X version 2.0 [37], and a
putative SNP was accepted only when each of its two alleles
was represented by at least two different clones.

A total of 60 putative SNPs were thus identified and screened to
select a set of informative SNPs suitable for pummelo cultivar
identification. First, each SNP was tested experimentally to
exclude those that could not be genotyped by the HRMA
method. Second, allelic frequencies were estimated for each of the
remaining SNPs by genotyping 24 randomly selected Chinese
pummelo accessions (excluding somatic mutants), and the SNPs
with a minor allele frequency >10% were retained as candidates.
Third, all candidate SNPs were mapped onto the sweet orange 
reference genome [38] to estimate the physical distance and the 
linkage relationship between any two SNPs, and those without 
significant linkage to others were preferred. However, some 
physically linked SNPs were both retained for their high PIC 
values, and treated as super loci (described in Genotype analysis). 
Finally, a set of 25 SNPs were selected. 
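The frequency and PIC criteria used in this selection can be illustrated with a minimal Python sketch. It assumes that genotypes are written as two-letter strings such as 'AG' and that PIC is computed with the commonly used formula 1 - Σp_i² - ΣΣ 2p_i²p_j²; the function names and the toy genotypes are hypothetical and are not taken from the study.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes written as 2-letter strings."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def minor_allele_frequency(genotypes):
    """Frequency of the rarest allele; 0.0 if the locus is monomorphic."""
    freqs = allele_frequencies(genotypes)
    return min(freqs.values()) if len(freqs) > 1 else 0.0


def pic(genotypes):
    """Polymorphic information content: 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2."""
    p = list(allele_frequencies(genotypes).values())
    homozygosity = sum(x * x for x in p)
    cross = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1 - homozygosity - cross


def passes_maf_filter(genotypes, threshold=0.10):
    """Keep a SNP only if its minor allele frequency exceeds the threshold."""
    return minor_allele_frequency(genotypes) > threshold


if __name__ == "__main__":
    toy = ["AA", "AG", "AG", "GG"]  # hypothetical genotypes at one SNP
    print(minor_allele_frequency(toy), pic(toy), passes_maf_filter(toy))
```

For a bi-allelic SNP the PIC computed this way cannot exceed 0.375, which is reached when both alleles have a frequency of 0.5.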

SNP genotyping 

High resolution melting analysis of small amplicons was used in 
SNP genotyping by following the protocol described by Gundry et 
al. [25]. LCGreen Plus+ Melting Dye (BioFire Diagnostics, Salt
Lake City, USA) was used, and HRMA was performed on a 96-well
LightScanner System (BioFire Diagnostics, Salt Lake City, USA). The
fluorescence signal was recorded from 55°C to 90°C for all SNPs.
The 3' ends of the primers were designed to be as close as possible
to the SNP loci so that the amplicons would be as short as possible
[39,40]. The primer sequences and lengths of the amplicons are
listed in Table 1.
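To make the idea of a melting temperature concrete, the following Python sketch estimates Tm as the temperature at which the negative derivative of fluorescence, -dF/dT, peaks. It is only an illustration under stated assumptions: the curves are synthetic sigmoids generated by a hypothetical helper, the 55°C to 90°C range follows the text, and nothing here reproduces the instrument software.

```python
import math


def melting_temperature(temps, fluorescence):
    """Estimate Tm as the temperature where -dF/dT (central difference) peaks."""
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = -(fluorescence[i + 1] - fluorescence[i - 1]) / (
            temps[i + 1] - temps[i - 1]
        )
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


def synthetic_curve(tm, width=0.5, start=55.0, stop=90.0, step=0.1):
    """Hypothetical sigmoid melting curve; real curves come from the instrument."""
    n_points = int(round((stop - start) / step)) + 1
    temps = [start + i * step for i in range(n_points)]
    fluor = [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]
    return temps, fluor


if __name__ == "__main__":
    # Two hypothetical homozygotes whose Tm values differ by 1 degree C.
    low = melting_temperature(*synthetic_curve(78.0))
    high = melting_temperature(*synthetic_curve(79.0))
    print(low, high, high - low)
```

In this simplified picture, two homozygotes appear as curves of similar shape shifted along the temperature axis, whereas heterozygotes are recognized by a change in curve shape rather than by Tm alone.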

Genotyping by direct sequencing of PCR amplicons was used to 
a) verify the HRMA results, b) identify possible primer-template 
mismatches encountered in HRMA analysis, and c) genotype the
gene segments. PCR primers were designed so that the amplicons
would be around 500 bp long with the known SNP in the middle,
ensuring that high-quality sequence surrounding the SNP
would be obtained by the Sanger sequencing method used in
this study. For identifying primer-template mismatches, primers
were compared with their complementary sequences of the 
templates. 

For the 12 sequenced gene segments, SNPs were detected and 
all genotypes were read out from sequencing data by using Variant 
Reporter Software v1.1 (Applied Biosystems, Foster City, CA), and
the genotypes were visually examined by reading sequence 
chromatograms. 

Genotype analysis 

The total number of genotypes and the common genotypes 
shared by two or more cultivars were analyzed using Dropout 
[41]. The distribution of the minimum pairwise differences 
between a genotype and other genotypes was shown by Dropout.
All pummelo accessions/individuals were grouped, either by the
geographical origin where the samples were collected (MYANMAR
group) or by traditional classification (SHATIANYOU group and
WENDAN group) (Table S1). The fixation coefficient was
calculated for each group using GenAlEx [42]. The increase
in the number of distinguished genotypes with the addition of
SNPs was also analyzed using GenAlEx. To show population structure, a
Bayesian clustering method was applied to the whole dataset to
assign genotypes to inferred clusters using STRUCTURE version
2.3.4 [43]. Ten independent runs were performed for each K from 1 to 8,
each with 1,000,000 Markov Chain Monte Carlo (MCMC)
repetitions and a burn-in period of 100,000, using no prior
information and assuming correlated allele frequencies and
admixture. The ln likelihood of the posterior probability of K [P
(K | X)] was used to choose the most likely value for K.
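The choice of K described above amounts to averaging the ln likelihood over the independent runs for each K and taking the K with the highest mean. A minimal Python sketch follows; the function name and the ln likelihood values are invented for illustration and are not results from this study.

```python
from statistics import mean


def most_likely_k(ln_likelihoods_by_k):
    """Return (best K, mean ln likelihood per K) from replicate runs."""
    means = {k: mean(values) for k, values in ln_likelihoods_by_k.items()}
    best_k = max(means, key=means.get)
    return best_k, means


if __name__ == "__main__":
    # Hypothetical ln likelihoods from three replicate runs per K.
    runs = {
        1: [-5210.0, -5209.5, -5210.4],
        2: [-5105.2, -5104.8, -5106.0],
        3: [-5060.1, -5059.7, -5061.3],
        4: [-5031.9, -5032.4, -5030.8],
        5: [-5040.2, -5047.6, -5038.9],
    }
    best_k, means = most_likely_k(runs)
    print(best_k, round(means[best_k], 2))
```

Averaging over replicates is the simplest criterion; it does not account for the variance between runs, which tends to grow at larger K.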

The statistical power in identity analysis was calculated for the 
set of 25 SNPs used in the total sample and in different groups or 
populations, respectively. The level (r²) of linkage disequilibrium
(LD) between SNPs was analyzed using PowerMarker V3.25 [44].
SNPs in significant LD (r² > 0.1, p < 0.05) were combined as a
super locus, and haplotypes and haplotype frequencies were
inferred using PowerMarker V3.25. The probability of identity
between two random individuals (PI), between two randomly
sampled siblings (PIsibs), and between a randomly selected pair of
a parent and an offspring (PIpar-off) were calculated using
GenAlEx [42,45,46,47]. The PI, PIsibs and PIpar-off for each of
the 178 genotypes identified from the total sample were also
calculated using Dropout [41]. 
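For orientation, the widely used textbook expressions for PI and PIsibs at a single locus, and their combination across independent loci by multiplication, can be sketched in Python as follows. This is an assumption-laden illustration rather than the exact procedure of GenAlEx or Dropout, which may apply corrections not shown here; the function names are hypothetical. For a super locus, the haplotype frequencies would take the place of the allele frequencies.

```python
from math import prod


def pi_locus(freqs):
    """P(two unrelated individuals share a genotype): sum p_i^4 + sum_{i<j} (2 p_i p_j)^2."""
    homozygous = sum(p ** 4 for p in freqs)
    heterozygous = sum(
        (2 * freqs[i] * freqs[j]) ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return homozygous + heterozygous


def pi_sibs_locus(freqs):
    """P(two full siblings share a genotype) at one locus."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def combine(per_locus_values):
    """Multi-locus value, assuming the loci are independent."""
    return prod(per_locus_values)


if __name__ == "__main__":
    # Hypothetical panel of 21 independent bi-allelic loci with p = q = 0.5.
    panel = [[0.5, 0.5]] * 21
    print(combine(pi_locus(f) for f in panel))
    print(combine(pi_sibs_locus(f) for f in panel))
```

The sibling value at each locus can never fall below 0.25, which is why a panel that separates unrelated individuals well may still have limited power for siblings.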

For the 12 gene segments, segregating sites (S), nucleotide 
diversity (π) and haplotype diversity (Hd) were obtained by using
DnaSP v. 5.10.01 [48]. Haplotypes were reconstructed from
genotype data using DnaSP v. 5.10.01. PI and PIsib for the 12 
gene segments were calculated using GenAlEx [42]. 
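The two diversity statistics named above can be sketched in a few lines of Python. The sketch assumes Nei's estimator Hd = n/(n-1) × (1 - Σp_i²) for haplotype diversity and the mean number of pairwise differences per site for nucleotide diversity, computed on aligned sequences of equal length without gaps. DnaSP handles gaps and missing data in ways not modelled here, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's Hd = n/(n-1) * (1 - sum p_i^2) over n sampled haplotype copies."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotype copies are required")
    counts = Counter(haplotypes)
    sum_squares = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_squares)


def nucleotide_diversity(sequences):
    """Mean pairwise differences per site for equal-length aligned sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    differences = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return differences / (len(pairs) * length)


if __name__ == "__main__":
    # Hypothetical haplotype labels and aligned sequences.
    print(haplotype_diversity(["h1", "h1", "h2", "h2"]))
    print(nucleotide_diversity(["AAAA", "AAAT", "AAAT"]))
```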

The number of cultivars (n) which could be reliably discrim- 
inated by the selected SNPs or DNA segments was calculated 
using the following formula: 

$(1 - \mathrm{PI})^{n(n-1)/2} > 95\%$

where PI is the total PI of the selected SNPs or DNA segments and
n(n-1)/2 is the number of pairwise comparisons among n cultivars.
This formula assures that in a sample of n cultivars, the chance
of there being no falsely identical genotypes (two different
genotypes identified as the same using the marker set) is
larger than 95%.
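The inequality can be solved for the largest admissible n with a short Python sketch. The function names are hypothetical, and the numbers it prints are a direct evaluation of the inequality as written above, not a check of the figures quoted elsewhere in the text.

```python
import math


def no_false_match_probability(pi, n):
    """(1 - PI)^(n(n-1)/2): chance that no pair among n cultivars matches falsely."""
    pairs = n * (n - 1) // 2
    return math.exp(pairs * math.log1p(-pi))


def max_cultivars(pi, confidence=0.95):
    """Largest n for which (1 - PI)^(n(n-1)/2) > confidence."""
    if not 0.0 < pi < 1.0:
        raise ValueError("PI must lie strictly between 0 and 1")
    # Upper bound on the number of pairs, then invert n(n-1)/2 = pairs.
    max_pairs = math.log(confidence) / math.log1p(-pi)
    n = max(1, int((1 + math.sqrt(1 + 8 * max_pairs)) / 2))
    # Correct for rounding in either direction.
    while n > 1 and not no_false_match_probability(pi, n) > confidence:
        n -= 1
    while no_false_match_probability(pi, n + 1) > confidence:
        n += 1
    return n


if __name__ == "__main__":
    for pi in (1e-2, 1e-6, 1e-8):  # hypothetical total PI values
        print(pi, max_cultivars(pi))
```

Because the number of pairs grows with the square of n, a hundredfold decrease in PI raises the admissible n only about tenfold.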



Construction of Neighbour-joining (NJ) tree 

The haplotypes inferred from the 12 gene segments were used
to construct a Neighbour-joining tree, together with the corresponding
sequences of the haploid sweet orange genome [38], the
haploid Clementine genome [49] and Nanju (Citrus reticulata Blanco
var. Nanju) whole genome resequencing data (our unpublished
data), using PAUP 4.0 [50]. The evolutionary distances were
computed using the Tajima-Nei method [51] and trees were
displayed in FigTree V1.40 [52].

Results 

Identification of pummelo SNPs and genotyping of 
pummelo accessions by HRMA 

A total of 60 putative pummelo SNPs were obtained using the
two strategies described in Materials and Methods, and from
them a set of 25 SNPs (hereafter referred to as Set1 SNPs) was
selected and used to identify pummelo cultivars.
Mapping onto the sweet orange reference genome
showed that these SNPs are interspersed on 8 of the 9 citrus
(2n = 18) chromosomes (Table 1), except two that are located on
scaffolds that have not been assigned to any chromosome.

The genotypes of the pummelo accessions were revealed by 
using high resolution melting analysis of amplicons. At all 25
Set1 SNPs, heterozygotes were clearly distinguishable from
homozygotes by shapes of their respective melting curves and 
derivative melting curves. The Tm difference between the two 
different homozygote amplicons of each SNP was between 0.6 and 
2.0°C, and accordingly, the amplicons' melting and first derivative 
curves were shifted away from each other along the horizontal axis 
(Figure 1). 

HRMA of small amplicons was expected to be less powerful in
distinguishing different homozygotes of A/T or G/C SNPs, since
the two homozygotes do not differ in the number of hydrogen bonds.
However, the SNP chr1_25991020A/T, which is
an A/T SNP, showed a difference of ~1°C in Tm values between
the two different homozygotes, making them clearly distinguishable
(Figure 1). Currently, we have no explanation for this.

To our surprise, two or more different heterozygous melting
curves were observed at 3 SNPs during our initial SNP screening
work, as shown for SNP chr4_18094735A/G in Figure 1. It was
suspected that there existed extra mutation(s) in the primer-template
complementary regions, which made the PCR amplification favor
the template that paired perfectly with the primers, because any
mismatch between template and primer would reduce the chance
of the primer annealing to the template. Consequently, the majority
of the amplicons should be derived from the template pairing perfectly
with the primers rather than from the one pairing imperfectly,
and the shape of the resulting melting curve should thus
be distorted so as to approach the typical curve of the amplicons
from the homozygous template without primer-template mismatches
(Figure S1). To verify this, the flanking sequences of the 3
SNPs were cloned and sequenced, and new SNPs were indeed
found in their primer-template pairing regions (Figure S1). The
three SNPs were therefore excluded from further use.

To assess how reliable our HRMA genotyping results were, 36
cultivars were repeatedly analysed (two to three times) at all Set1
SNP loci, and a total of 6 mismatches were identified. Detailed
analysis indicated that these mismatches resulted from
experimental errors, i.e., sample loading errors, which were
identifiable and correctable by either repeating the HRMA analysis or
manually inspecting the melting curves. In addition, genotyping by
sequencing was also done for 1~25 accessions at Set1 SNP loci to
Figure 1. Representative high resolution melting curves for pummelo SNPs. The left and right panels show melting curves and
derivative melting curves, respectively. Red and green curves represent homozygous amplicons with low and high melting temperatures, respectively.
Gray and blue curves represent heterozygotes. Top panels: melting curves of an A/G SNP (chr1_20043485A/G). Middle panels: melting curves of an A/T
SNP (chr1_25991020A/T). Bottom panels: melting curves of an A/G SNP (chr4_18094735A/G); note the two different heterozygotes with different
melting curves, the normal heterozygote (referred to as He1) shown in gray and the abnormal heterozygote (referred to as He2, which probably
contained a primer-template mismatch) in blue.
doi:10.1371/journal.pone.0094506.g001



verify the HRMA data. The results from
sequencing and HRMA were found to be fairly consistent.

Genotypes of the 260 pummelo accessions 

Genotyping of the 260 accessions was conducted at all Set1
SNPs (Table S1). As a result, a total of 178 different genotypes
were identified (Table S1). The minimum
difference between a genotype and any of the other genotypes
averaged 6 SNPs, and the smallest was 2 SNPs
(Figure 2). Most of the samples with the same genotype were
found to be individuals of the same cultivar or cultivars derived
from somatic mutations. Synonyms were identified, including
Gaopoyou/Bianboyou, Guanxiangyou/Pengxiyou, and Jinshayou/Baiyushuang.
The CUL accessions were assigned to 86 different genotypes.
For 6 cultivars, the samples collected from Chongqing did
not match those collected from Guangzhou. Eleven of the 26
UNKNOWN accessions matched known cultivars, and the
remaining 15 accessions were assigned to 15 new genotypes.
The 55 MYANMAR individuals were assigned to 55 unique
genotypes. The 24 HYBRID accessions were assigned to 20
different genotypes, since Red Hassaku and Hassaku were identified
as the same genotype, as were Beni Amanatsu and Kawano
Natsudaidai, and the Red Marsh, Star Ruby and Flame grapefruits.
Honghe papeda and its hybrid were assigned to 2 unique
genotypes. Figure 3 shows that the number of genotypes
identified increased as more SNPs were used, and 13 SNPs were
already enough to distinguish all 178 genotypes. Only 11 and 8
SNPs were needed to identify all the CUL genotypes and the
MYANMAR individuals, respectively.
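The two summaries reported in this section, the number of distinct genotypes as SNPs are added one by one and the minimum number of SNPs separating each genotype from all others, can be sketched in Python as below. The data layout (one dictionary per sample, mapping a SNP name to a genotype string), the SNP names and the function names are assumptions made for illustration; the study itself used Dropout and GenAlEx.

```python
def distinct_genotype_counts(samples, snp_order):
    """Number of distinct multi-locus genotypes after adding each SNP in turn."""
    counts = []
    for k in range(1, len(snp_order) + 1):
        used = snp_order[:k]
        profiles = {tuple(sample[snp] for snp in used) for sample in samples}
        counts.append(len(profiles))
    return counts


def min_differences(profiles):
    """For each profile, the fewest SNPs at which it differs from any other one."""
    result = []
    for i, a in enumerate(profiles):
        result.append(
            min(
                sum(x != y for x, y in zip(a, b))
                for j, b in enumerate(profiles)
                if j != i
            )
        )
    return result


if __name__ == "__main__":
    # Hypothetical genotypes of three samples at two SNPs.
    samples = [
        {"snp_a": "AA", "snp_b": "CC"},
        {"snp_a": "AA", "snp_b": "CT"},
        {"snp_a": "AG", "snp_b": "TT"},
    ]
    order = ["snp_a", "snp_b"]  # e.g. ranked by decreasing PIC
    print(distinct_genotype_counts(samples, order))
    print(min_differences([tuple(s[snp] for snp in order) for s in samples]))
```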

From this study, a DNA fingerprint database was established for
the analyzed pummelos (Table S1), which can be used as a reference
when a cultivar needs to be typed.

Population structure analysis and Statistical Power of 
SNPs in cultivar identification 

The 25 SNPs have an average PIC value of 0.271 in the total 
sample (Table 1). The same set of SNPs could have different power 
in identity analysis in different populations, since populations are 
often different in allelic frequencies. According to tradition, the 
101 CUL+UNKNOWN genotypes (86 cultivar genotypes +15 
unknown genotypes) were classified into SHATIANYOU group (7 
genotypes), WENDAN group (25 genotypes) and UNASSIGNED 
group (69 genotypes). To detect whether there truly was population
structure in pummelos, the genotype data were subjected to
population genetics analysis. A test for Hardy-Weinberg
equilibrium showed that the null hypothesis was rejected (p <
0.05) at 9 and 3 of the 25 Set1 SNPs in the CUL+UNKNOWN
group and the MYANMAR group, respectively, suggesting that the
accessions in the two arbitrarily assigned groups could in fact be
from different populations. In addition, the fixation coefficients of
the SHATIANYOU group and the WENDAN group were 0.02
and 0.10, respectively, also suggesting that there could be
unresolved population structure in the WENDAN group.
Figure 2. Distribution of the minimum number of SNPs that differed between a genotype and other genotypes. Axis label: minimum number of SNPs differing from other genotypes.
doi:10.1371/journal.pone.0094506.g002

Figure 3. The relationship between observed genotypes and numbers of used SNPs. ALL, all the 178 genotypes; MYANMAR, the 55
Myanmar genotypes; CUL, the 86 cultivar genotypes. The 25 SNPs were ranked, from large to small, by their PIC values in ALL, MYANMAR and CUL,
respectively, and then added one by one into the analysis.
doi:10.1371/journal.pone.0094506.g003

By using the Bayesian clustering program STRUCTURE version
2.3.4 with K = 1~8, the average ln likelihood values peaked at
K = 4 for both the CUL+UNKNOWN and the CUL+UNKNOWN+MYANMAR
datasets, indicating that most likely there were 4
populations in our analyzed samples. The 4 inferred populations
of CUL+UNKNOWN+MYANMAR had a population
differentiation value (Fst) of 0.16, and were designated as P1,
P2, P3 and P4. Of the 156 CUL+UNKNOWN+MYANMAR genotypes,
29, 46, 38 and 43 were clustered into P1, P2,
P3 and P4, respectively (Figure S2).

Significant LD (r² > 0.1, p < 0.05) was discovered in three groups
of SNPs. The first group contained two SNPs, chr2_30594899T/C
and chr2_30595627T/C, which were closely spaced on
chromosome 2. The second group contained three SNPs,
chr5_12963514T/C, chr5_13684450T/C and
chr5_15275826A/G, located on chromosome 5. The SNPs of the third
group, chrUn_5023005A/G and chrUn_19904498A/G, were on
different scaffolds that have not yet been assigned to any of the sweet
orange chromosomes. The 3 groups of SNPs were therefore
regarded as three super loci, and hence the 25 Set1 SNPs were
actually treated as 21 loci in our statistical analysis (Figure 4).

When using the Set1 SNPs in CUL+UNKNOWN and MYANMAR,
we obtained PI values of 5.28E-08 and 4.25E-07,
respectively, indicating that the discriminating power of the Set1 SNPs
differed between the two groups (Figure 4). Comparison
between the 4 populations showed that the discrimination
power of the SNP set was strongest in P3 and weakest in P2.
The PI values in P3 and P2, 1.36E-08 and 1.38E-06, respectively,
differed by two orders of magnitude (Figure 4).
However, theoretical calculation showed that the
Set1 SNPs were still powerful enough to be used for identity
analysis in a population of 1000 individuals (N) with allelic
frequencies similar to those of P2. The power of this set of SNPs in discriminating
siblings was given by PIsib (Figure 4), which varied from the
strongest (1.24E-04) in P3 to the weakest (1.35E-03) in P2,
suggesting the set of SNPs could discriminate 40 to 150 siblings.
The PI, PIsib and PIpar-off for each of the 101 CUL+
UNKNOWN genotypes are given in Table S2. For parentage
exclusion with 200 candidate pummelo parents [53], the set of
SNPs was shown to be not strong enough, and two to three times
more SNP loci would be needed to be statistically powerful
enough.

Use of DNA sequences for pummelo cultivar 
identification 

Twelve gene segments were sequenced for 24 accessions
(Table 2). The lengths of the sequenced gene segments were
between 410 and 630 bp and totaled 6107 bp. On these sequenced
segments, 127 reliable SNPs were identified. Haplotype profiles
were obtained for all 24 accessions based on these SNPs
(Table 2). As shown in Table 2, the 24 accessions were assigned to
23 distinct genotypes, which was consistent with the result of
genotyping the Set1 SNPs. Notably, Hejiangyou and Lingnanshatianyou
were identified as the same genotype by both methods.
Since Boluoxiangyou, Tangyouzi, Jiaodaoyou and Iwaikan were
shown by the phylogenetic tree to contain various numbers of
mandarin haplotypes (Figure S3), they were treated as hybrids
and excluded from the analysis of pummelo intra-specific SNPs.
Within the remaining sequences we identified 54 pummelo intra-specific
SNPs (hereafter referred to as Set2 SNPs) (Table 3).

For the remaining 20 true-to-type pummelo cultivars, the
nucleotide diversity (π) of the 12 gene segments varied from 0.0004
to 0.0061 and averaged 0.0029 (Table 3). The number of
identified haplotypes on each gene segment varied from 2 to 9 and
averaged 4.7 haplotypes/segment, and the haplotype diversity
(equivalent to PIC) varied from 0.31 to 0.80 and averaged 0.57




Figure 4. Increases in -lg(PI) and -lg(PIsib) with the use of increasing numbers of SNPs in different populations or groups. Axis label: combination of increased loci. SNPs were
added one by one in the order from 1 to 25 (listed in Table 1) in the construction of the plot. (chr2_30594899T/C, chr2_30595627T/C),
(chr5_12963514T/C, chr5_13684450T/C, chr5_15275826A/G), and (chrUn_5023005A/G, chrUn_19904498A/G) were treated as three super loci (the 6th,
12th and 21st dots (large)). The solid and dashed lines connect PIs and PIsibs, respectively.
doi:10.1371/journal.pone.0094506.g004

(Table 3), which was more than twice the average PIC value of
the Set1 SNPs. Significant LD was discovered between two linked
gene segments (Cs9g14320 and Cs9g16170) mapped on pseudo-chromosome
9 of the reference sweet orange genome, and the two
segments were combined as a super locus in the PI calculation. PI and
PIsib of the 12 segments were 4.9E-08 and 8.7E-04, respectively,
suggesting that the combined use of the 12 gene segments is very
powerful for pummelo cultivar identification but has limited power
in distinguishing siblings. The haplotype diversity
of each segment was significantly (p < 0.01) correlated (Pearson
correlation analysis, r² = 0.53) with the number of SNPs found
on the segment (which varied from 1 to 10 SNPs).

Discussion 

Cultivar identification is a prerequisite for more efficient
breeding and more successful cultivation of a crop. The
problem is that it is often difficult to identify cultivars solely by
morphological traits [54]. This is because commercial cultivars are
more or less similar to each other in many agronomic traits that
have been convergently selected by humans. Though many
methods have been explored, it seems that only DNA markers
offer a satisfactory solution to this problem, since a suitable number
of DNA markers, if used collectively, has been demonstrated to
be good enough to identify non-clonal individuals or plant
varieties [8,9,16,17,18]. For example, a bi-allelic SNP locus can
have three possible genotypes, 2 homozygous and 1 heterozygous,
and N such SNP loci can have 3^N different combinations, a
number that increases exponentially with N. If N is large
enough, the possible combinations will also be numerous enough to
accommodate all known sexually originated cultivars. In other
words, the combinations of a suitable number of DNA markers can
be used as genetic barcodes to be assigned to a definite number of
cultivars. The advantages of such a cultivar identification system
are twofold: 1) the barcodes can be easily recorded and shared; 2)
the identification results are rather precise with statistical support.
With such a system, identifying a cultivar requires only analyzing its
genotypes at the loci used for genetic barcodes and comparing the
results to those in the database. In this study we investigated the
possibility of using combinations of either SNPs or DNA segment
sequences in the identification of pummelo cultivars that are easily
misidentified by morphological traits, and to our satisfaction, both
methods were shown to be powerful enough to identify our study
samples. Therefore, the SNP genotypes listed in Table S1 can be
used as reference genetic barcodes for pummelos.

A high PIC and a low level of LD between markers are the 
two important factors to be considered in selecting markers for 
highly efficient identification of cultivars [8,13]. PIC is largely 
dependent on allelic frequency. To increase PIC, some researchers 
used only SNPs with an allelic frequency higher than 0.1 [55], and 
others required a PIC value higher than 0.2 when choosing 
markers [8]. In our case, with the use of SNPs with an allelic 
frequency higher than 0.1, a relatively satisfactory PIC value 
was obtained. LD not only reduces the overall discrimination 
power of the markers used [8,56] but also influences the evaluation 
of their discrimination power. Nevertheless, physically linked markers 
can still be used in parentage and identity 
analysis, since the linkage phase can be assessed and the tightly 
linked SNPs can be used as a 'super locus' [57,58]. Such a super 
locus can still have a very high PIC, as shown in the present study 
and other studies [58]. However, markers in significant LD are not 
necessarily physically linked, since LD can also be caused by 
other factors such as genetic drift, bottleneck effects, selection 
within populations, and population admixture [59]. In our 
study, two of the Set1 SNPs, chrUn_5023005A/G and 
chrUn_19904498A/G, were in LD but most probably were not 
physically linked. 
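
The dependence of PIC on allele frequency can be illustrated with a minimal sketch. It assumes the commonly used definition PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the text above does not state which formula was applied, so this choice, the function name and the example frequencies are assumptions. The same function accepts haplotype frequencies, which shows why a 'super locus' with several haplotypes can exceed the maximum PIC of a single bi-allelic SNP.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphic information content of one locus from its allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity under random mating.
    homozygosity = sum(p * p for p in freqs)
    # Correction for matings that are uninformative about allele transmission.
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


if __name__ == "__main__":
    for maf in (0.1, 0.3, 0.5):
        print(f"allele frequency {maf}: PIC = {pic([maf, 1 - maf]):.4f}")
    # Hypothetical super locus with four equally frequent haplotypes.
    print(f"four haplotypes: PIC = {pic([0.25] * 4):.4f}")
```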

The transferability of the SNPs discovered from certain samples 
to other samples may be limited, which is known as ascertainment 
bias [60,61]. Ascertainment bias could influence the use of SNPs in 
cultivar identification. This is because SNPs highly polymorphic in 
one population are not necessarily similarly polymorphic in other 
populations. However, the problem can be addressed at the stage of 
SNP discovery. An intuitive way is to select ancient SNPs that are 
homogeneously polymorphic among the study populations, as 
demonstrated in 6 Eucalyptus species by Correia et al. [15]. In a 
case of human individual identification, small Fst values (<0.06 in 
the study of Pakstis et al.) were required for SNPs in order to avoid 
ascertainment bias [10]. In this study, ascertainment bias was 
shown to influence the PI value of Set1 SNPs by at most two 
orders of magnitude; even so, the set of SNPs should still be usable 
in a predicted population of N = 1000 with the lowest PI value. 
Therefore, the barcode system will have enough room for 
accommodating future cultivars. 
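
How a multilocus PI relates to a population of a given size can be sketched as follows. The sketch assumes the standard single-locus formulas for unrelated individuals and for full sibs from the probability-of-identity literature [46] and multiplies them across loci, which presumes independent loci (an assumption that LD violates). The function names, the allele frequencies and the expected-matching-pairs criterion are illustrative assumptions, not the calculation performed in this study.

```python
from itertools import combinations
from math import prod


def pi_locus(freqs):
    """P(two unrelated individuals share a genotype) at one locus."""
    homozygous = sum(p ** 4 for p in freqs)
    heterozygous = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygous + heterozygous


def pi_sibs_locus(freqs):
    """P(two full sibs share a genotype) at one locus."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def pi_multilocus(loci, per_locus=pi_locus):
    """Product of per-locus values, assuming independent loci."""
    return prod(per_locus(freqs) for freqs in loci)


def expected_matching_pairs(pi_value, n_individuals):
    """Expected number of pairs sharing a genotype by chance among n individuals."""
    return pi_value * n_individuals * (n_individuals - 1) / 2


if __name__ == "__main__":
    loci = [[0.3, 0.7]] * 25          # hypothetical frequencies for 25 SNPs
    pi_value = pi_multilocus(loci)
    print(pi_value, expected_matching_pairs(pi_value, 1000))
    print(pi_multilocus(loci, per_locus=pi_sibs_locus))
```

A panel can be regarded as adequate for N individuals when the expected number of chance matches is well below one; the sib formula gives the more conservative figure.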

Taken together, the Set1 SNPs should be well suited for 
pummelo cultivar identification. This conclusion was verified 
by the genotyping results of the 260 accessions. First, only 13 SNPs 
were needed to discriminate all the discovered genotypes (Figure 3), 
and every two genotypes differed by at least 2 SNPs when all 
the 25 SNPs were used (Figure 2), showing that this set of SNPs 
was more than enough for the discrimination of all study 
genotypes. Second, accessions known to be different were 
discriminated, and all the 55 Myanmar individuals grown from 
seedlings were assigned to 55 different genotypes. Third, the 
obtained PI values also suggested that the 25 Set1 SNPs were 
powerful enough for identification of pummelo cultivars. 
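
The two checks in the first point (the smallest pairwise difference between genotypes and a reduced SNP subset that still separates all genotypes) can be sketched as below. How the 13-SNP subset was obtained is not described in this section, so the greedy heuristic here is an assumption; it does not guarantee the smallest possible subset. The toy genotypes are made up.

```python
from itertools import combinations


def min_pairwise_differences(genotypes):
    """Smallest number of loci at which any two multilocus genotypes differ."""
    return min(
        sum(a != b for a, b in zip(g1, g2))
        for g1, g2 in combinations(genotypes, 2)
    )


def greedy_discriminating_loci(genotypes):
    """Greedily pick loci until every pair of genotypes is separated."""
    n_loci = len(genotypes[0])
    unresolved = set(combinations(range(len(genotypes)), 2))
    chosen = []
    while unresolved:
        # Choose the locus that separates the most still-unresolved pairs.
        best = max(
            range(n_loci),
            key=lambda k: sum(
                genotypes[i][k] != genotypes[j][k] for i, j in unresolved
            ),
        )
        separated = {
            (i, j) for i, j in unresolved if genotypes[i][best] != genotypes[j][best]
        }
        if not separated:
            raise ValueError("some genotypes are identical at all loci")
        chosen.append(best)
        unresolved -= separated
    return chosen


if __name__ == "__main__":
    toy = [
        ("AA", "GG", "CC"),
        ("AA", "AG", "CC"),
        ("AG", "GG", "CT"),
        ("AG", "AG", "CC"),
    ]
    print(min_pairwise_differences(toy))
    print(greedy_discriminating_loci(toy))
```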

The accuracy of the genotyping method influences the efficacy 
of cultivar identification [62]. In this respect, HRMA has been 
shown to be highly efficient and accurate [25,36,40,63,64,65]. In 
the study of Gundry et al. [25], the accuracy of HRMA reached 
99.7%. However, mismatches between template and primer 
greatly reduced the accuracy, as shown in this study (Figure 1 
and Figure S1), and should therefore be avoided. It was noted that 
even a 5' end template-primer mismatch distorted the melting 
curves severely enough to interfere with judgment about the 
genotype (Figure 1), and more severe distortion with 3' end 
mismatches should be expected. 

In the course of verifying the Set1 SNP genotyping results by 
sequencing, we noted that a high haplotype diversity existed in the 
sequences. This prompted us to investigate whether it was possible to use 
the gene sequences directly to identify cultivars, which is similar to 
the traditional DNA barcoding technology used in identifying 
species. We set out to sequence a total of 12 gene segments for 24 
representative accessions. At first glance, the total PIC value of 
the 54 Set2 SNPs, being in 12 tightly linked groups, must be 
lower than that of the same number of independent SNPs, but in 
fact the high SNP density also increased the haplotype diversity. 
The resulting PI value suggested that for the purpose of cultivar 
identification the 12 gene segments were already powerful enough. 
It must be pointed out that we did not intentionally select the 
target segments, yet they showed a surprisingly high discrimination 
power. Therefore, it should be possible to select even fewer high 
PIC segments to identify pummelo cultivars. Apparently, more 
DNA segment sequences are needed to identify pummelo cultivars 
than in the DNA barcoding technology used for species 
identification, which uses only 1 or a few DNA segment sequences 
[28,29,30], because the genetic differences are smaller between cultivars 
than between species. 
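
Haplotype diversity for one sequenced segment can be estimated with a short sketch. It assumes Nei's sample-size-corrected estimator, Hd = n/(n-1) * (1 - sum(p_i^2)), where n is the number of haplotype copies sampled; whether exactly this estimator underlies the values reported here is not stated in this section. The haplotype strings are made up.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Sample-size-corrected haplotype diversity for one DNA segment."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotype copies")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability of drawing two alike).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # Hypothetical haplotypes: each string lists the bases at the SNP sites
    # of one segment, two strings per diploid accession.
    segment = ["ACG", "ACG", "ATG", "GTG"]
    print(round(haplotype_diversity(segment), 4))
```

Tightly linked SNPs within a segment contribute through the number and evenness of the haplotypes they define, which is why a dense cluster of SNPs can be more informative than its linkage would suggest.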



For fruit trees like citrus, the genetic difference may be either 
very large between cultivars derived from inter-specific or 
inter-generic hybridizations or very small between cultivars originating 
from somatic mutations [66,67]. It is easy to identify inter-specific 
or inter-generic hybrids but usually not easy to identify somatic 
mutants, for it is almost impossible to locate by chance the specific 
point mutations when using a limited number of markers. As an 
example, the three mutants of the white flesh Guanximiyou, namely red 
flesh/white sponge Guanximiyou, red flesh/red sponge Sanhongyou 
and yellow flesh/white sponge Huangjinyou, were genotypically 
identical to white flesh Guanximiyou at all analyzed loci. 
Similarly, the mutant red Hassaku was identical to Hassaku. Beni 
Amanatsu and Kawano Natsudaidai were identical as expected for 
both were mutants of Natsudaidai. The grapefruits 'Red Marsh', 
'Star Ruby' and 'Flame' were, not surprisingly, also found to 
have the same genotype, since almost all grapefruit lineages can 
be traced back to 'Duncan', the oldest grapefruit cultivar 
[68]. Though red flesh Shatianyou and early red flesh Shatianyou 
were identified to have the same genotype, they were not somatic 
mutants of any other SHATIANYOU member. Except for somatic 
mutants, samples with identical genotypes should be regarded as 
synonyms. Examples are four SHATIANYOU cultivars, Gulaoqian, 
Lingnanshatianyou, Hejiangyou and Zhenlongyou, which are 
known to be morphologically inseparable. Six groups of samples 
were identified as homonyms after verification with repeated 
genotyping; however, the possibility that they were mixed up 
during germplasm collection could not be excluded. In particular, 
the three Huazhoujuhong accessions (ID 162, 163 and 176 in 
Table S1) were in fact different and should be treated as 
homonyms, and the results support the notion that Huazhoujuhongs 
are a group of hybrids [69]. 
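
Screening a genotype table for candidate synonyms (or somatic mutants) and homonyms amounts to two groupings, sketched below. The accession IDs, names and genotype calls are made up, and the function names are hypothetical; deciding whether an identical-genotype group is a set of synonyms or of somatic mutants still requires pedigree or phenotype knowledge.

```python
from collections import defaultdict


def identical_genotype_groups(genotypes):
    """Return sorted lists of accession IDs that share one multilocus genotype."""
    groups = defaultdict(list)
    for accession, calls in genotypes.items():
        groups[tuple(calls)].append(accession)
    return sorted(sorted(ids) for ids in groups.values() if len(ids) > 1)


def homonym_candidates(names, genotypes):
    """Return cultivar names that are attached to more than one genotype."""
    seen = defaultdict(set)
    for accession, name in names.items():
        seen[name].add(tuple(genotypes[accession]))
    return sorted(name for name, variants in seen.items() if len(variants) > 1)


if __name__ == "__main__":
    genotypes = {
        "acc1": ("AA", "AG"),
        "acc2": ("AA", "AG"),
        "acc3": ("AG", "GG"),
        "acc4": ("GG", "GG"),
    }
    names = {"acc1": "name_X", "acc2": "name_Y", "acc3": "name_Z", "acc4": "name_Z"}
    print(identical_genotype_groups(genotypes))   # synonym or mutant candidates
    print(homonym_candidates(names, genotypes))   # homonym candidates
```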

The origin and dispersal of the pummelo species remain 
controversial, for no wild pummelos have been unequivocally 
recognized. South China, Southeast Asia and Northeast India 
were independently proposed to be the place of origin since 
pummelos were highly polymorphic in these areas 
[1,2,3,4,70,71]. Yunnan was suggested to be the center of origin of 
citrus for being rich in not only native citrus species but also river 
systems that could facilitate the dispersal of citrus [72]. This may 
be especially true regarding the origin and dispersal of pummelo 
because pummelo fruit, having peels with very thick sponge 
tissue that facilitates floating and enhances tolerance to mechanical 
injuries, could be carried away by rivers for long distances. It can 
be envisaged that pummelos were accidentally carried 
downstream by rivers originating from or flowing through Yunnan 
in ancient times, and then dispersed to nearby areas by fruit-eating 
animals, eventually developing into isolated populations. These isolated 
populations served as gene pools for domestication and should 
have gradually become admixed. In this study, an obvious 
population structure was indeed revealed by Set1 SNPs. In 
addition, high haplotype diversity and high nucleotide diversity 
were observed on 12 gene segments of the 24 representative 
cultivars. Furthermore, a large number of the pummelo genotypes 
were inferred to be admixtures of different populations (Figure S2). 
It should also be mentioned that many pummelos recorded in 
ancient Chinese literature were distributed along the Yangtze 
River and the Pearl River that flow through Yunnan, and that 
pummelos were also found along riverbanks in Yunnan and 
Myanmar during our field investigations. 

Supporting Information 

Figure S1 Verification of the primer-template mismatch 
suspected in HRMA analysis (bottom panels in 
Figure 1) by sequencing. A. Direct sequencing of one 
homozygote (Ho1) and the two heterozygotes (He1 and He2) on 
SNP chr4_18094735A/G (black arrows). The forward and reverse 
HRMA primers were indicated by blue and green arrows, 
respectively. A previously unknown SNP (chr4_18094754T/C, 
indicated by red arrows) was discovered in the reverse primer 
region and He2 was found to be heterozygous for this SNP. B. 
Diagrams showing how the extra SNP (chr4_18094754T/C) 
identified in panel A influenced the PCR amplification 
efficiency of He2 template. Three haplotypes (Ha1, Ha2 and 
Ha3) were reconstructed for chr4_18094735A/G and 
chr4_18094754T/C from analyzing melting curves and sequencing 
data. Ha3 was also identified in the sweet orange reference 
genome. The SNP induced mismatch on Ha3 (marked by a red 
cross) reduced the primer-template annealing temperature and 
thus reduced the PCR amplification efficiency for the He2. 
(TIFF) 

Figure S2 Assignment of 156 pummelo accessions 
(CUL+UNKNOWN+MYANMAR) to four populations by 
STRUCTURE version 2.3.4. P1, P2, P3, and P4 were 
represented by blue, violet, green and red, respectively. Accession 
IDs were the same as those in Table S1. 
(TIFF) 

Figure S3 Neighbour-joining trees based on inferred 
pummelo haplotypes on 12 gene segments. Nanju was used 
SNPs. Note: Red and blue background indicate the UNKNOWN 
group and the CUL group, respectively. 
individual is identical to the given genotype, as is the same case for 
PIsibs and PIpar-off [41]. 
(XLSX) 

Author Contributions 

Conceived and designed the experiments: GYZ BW. Performed the 
experiments: RTY YJL BW. Analyzed the data: BW RTY. Contributed 
reagents/materials/analysis tools: GYZ JQY CL YZ BJ JWZ LZ XW STY 
XJB DGZ. Wrote the paper: BW GYZ. 



16. Xie RJ, Zhou J, Wang GY, Zhang SM, Chen L, et al. (2010) Cultivar 
identification and genetic diversity of Chinese bayberry (Myrica rubra) 
accessions based on fluorescent SSR markers. Plant Mol Biol Rep 29: 554-562. 
doi: 10.1007/s11105-010-0261-6. 

17. Moriya S, Iwanami H, Abe K (2011) A practical method for apple cultivar 
identification and parent-offspring analysis using simple sequence repeat 
markers. Euphytica 177: 135-150. doi: 10.1007/s10681-010-0295-8. 

18. Butler JM (2006) Genetics and genomics of core STR loci used in human 
identity testing. J Forensic Sci 51: 253-265. 
doi: 10.1111/j.1556-4029.2006.00046.x. 

19. Hamblin MT, Warburton ML, Buckler ES (2007) Empirical comparison of 
simple sequence repeats and single nucleotide polymorphisms in assessment of 
maize diversity and relatedness. PLoS ONE 2(12): e1367. 
doi: 10.1371/journal.pone.0001367. 

20. Tokarska M, Marshall T, Kowalczyk R, Wojcik JM, Pertoldi C, et al. (2009) 
Effectiveness of microsatellite and SNP markers for parentage and identity 
analysis in species with low genetic diversity: the case of European bison. 
Heredity 103: 326-332. doi: 10.1038/hdy.2009.73. 

21. Rafalski A (2002) Applications of single nucleotide polymorphisms in crop 
genetics. Curr Opin Plant Biol 5: 94-100. doi: 10.1016/S1369-5266(02)00240-6. 

22. Gupta PK, Roy JK, Prasad M (2001) Single nucleotide polymorphisms: A new 
paradigm for molecular marker technology and DNA polymorphism detection 
with emphasis on their use in plants. Curr Sci India 80: 524-535. 

23. Curtu AL, Finkeldey R, Gailing O (2004) Comparative sequencing of a 
microsatellite locus reveals size homoplasy within and between European oak 
species (Quercus spp.). Plant Mol Biol Rep 22: 339-346. 
doi: 10.1007/BF02772617. 

24. Jones ES, Sullivan H, Bhattramakki D, Smith JS (2007) A comparison of simple 
sequence repeat and single nucleotide polymorphism marker technologies for the 
genotypic analysis of maize (Zea mays L.). Theor Appl Genet 115: 361-371. 
doi: 10.1007/s00122-007-0570-9. 

25. Gundry CN, Dobrowolski SF, Martin YR, Robbins TC, Nay LM, et al. (2008) 
Base-pair neutral homozygotes can be discriminated by calibrated 
high-resolution melting of small amplicons. Nucleic Acids Res 36: 3401-3408. 
doi: 10.1093/nar/gkn204. 

26. Tong SYC, Xie S, Richardson LJ, Ballard SA, Dakh F, et al. (2011) 
High-resolution melting genotyping of Enterococcus faecium based on multilocus 
sequence typing derived single nucleotide polymorphisms. PLoS ONE 6: 
e29189. doi: 10.1371/journal.pone.0029189. 

27. Montgomery J, Wittwer CT, Palais R, Zhou L (2007) Simultaneous mutation 
scanning and genotyping by high-resolution DNA melting analysis. Nat Protoc 
2: 59-66. doi: 10.1038/nprot.2007.10. 

28. Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological identifications 
through DNA barcodes. Proc R Soc Lond B 270: 313-321. 

29. Bruni I, De Mattia F, Martellos S, Galimberti A, Savadori P, et al. (2012) DNA 
Barcoding as an Effective Tool in Improving a Digital Plant Identification 
System: A Case Study for the Area of Mt. Valerio, Trieste (NE Italy). PLoS 
ONE 7(9): e43256. doi:10.1371/journal.pone.0043256. 

30. Schoch CL, Seifert KA, Huhndorf S, Robert V, Spouge JL, et al. (2012) Nuclear 
ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode 
marker for Fungi. P Natl Acad Sci-Biol 109: 6241-6246. Available: 
http://www.pnas.org/content/109/16/6241.short. 

31. Birky CW (1995) Uniparental inheritance of mitochondrial and chloroplast 
genes: mechanisms and evolution. P Natl Acad Sci-Biol 92: 11331-11338. 

32. Barrett HC, Rhodes AM (1976) A numerical taxonomic study of affinity 
relationships in cultivated Citrus and its close relatives. Syst Bot 1: 105-136. 

33. Nicolosi E, Deng ZN, Gentile A, Malfa LS, Continella G, et al. (2000) Citrus 
phylogeny and genetic origin of important species as investigated by molecular 
markers. Theor Appl Genet 100: 1155-1166. doi: 10.1007/s001220051419. 

34. Scora RW (1975) On the history and origin of Citrus. Bulletin of the Torrey 
Botanical Club 102: 369-375. 

35. Webber HJ (1943) Cultivated varieties of citrus. In: Webber HJ, Batchelor DL, 
editors. The Citrus Industry. California: University of California Press. 1: 475- 
668. 

36. Yang RT, Wu B, Li C, Zeng P, Zeng JW, et al. (2013) Comparison of 
allele-specific PCR and high resolution melting analysis in SNP genotyping and their 
application in pummelo cultivar identification. Acta Hort Sinica 40: 1061-1070. 

37. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, et al. (2007) 
Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947-2948. 

38. Xu Q, Chen LL, Ruan XA, Chen D, Zhu A, et al. (2013) The draft genome of 
sweet orange (Citrus sinensis). Nat Genet 45: 59-66. doi: 10.1038/ng.2472. 

39. Liew M, Pryor R, Palais R, Meadows C, Erali M, et al. (2004) Genotyping of 
single-nucleotide polymorphisms by high-resolution melting of small amplicons. 
Clin Chem 50: 1156-1164. doi: 10.1373/clinchem.2004.032136. 

40. Wu B, Yang RT, Zhu SP, Zhong Y, Jiang B, et al. (2012) Genotyping single 
nucleotide polymorphisms in mandarin cultivars using high resolution melting 
analysis. Acta Hort Sinica 39: 777-782. 

41. McKelvey KS, Schwartz MK (2005) DROPOUT: a program to identify 
problem loci and samples for noninvasive genetic samples in a 
capture-mark-recapture framework. Mol Ecol Notes 5: 716-718. 
doi: 10.1111/j.1471-8286.2005.01038.x. 

42. Peakall R, Smouse P (2012) GenAlEx 6.5: Genetic analysis in Excel. Population 
genetic software for teaching and research - an update. Bioinformatics 1: 6-8. 
doi: 10.1093/bioinformatics/bts460. 

43. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure 
using multilocus genotype data. Genetics 155: 945-959. 

44. Liu K, Muse SV (2005) PowerMarker: an integrated analysis environment for 
genetic marker analysis. Bioinformatics 21: 2128-2129. 
doi: 10.1093/bioinformatics/bti282. 

45. Taberlet P, Luikart G (1999) Non-invasive genetic sampling and individual 
identification. Biol J Linn Soc 68: 41-55. 
doi: 10.1111/j.1095-8312.1999.tb01157.x. 

46. Waits LP, Luikart G, Taberlet P (2001) Estimating the probability of identity 
among genotypes in natural populations: cautions and guidelines. Mol Ecol 10: 
249-256. doi: 10.1046/j.1365-294X.2001.01185.x. 

47. Peakall R, Ebert D, Cunningham R, Lindenmayer D (2006) Mark-recapture by 
genetic tagging reveals restricted movements by bush rats (Rattus fuscipes) in a 
fragmented landscape. J Zool 268: 207-216. 
doi: 10.1111/j.1469-7998.2005.00011.x. 

48. Librado P, Rozas J (2009) DnaSP v5: a software for comprehensive analysis of 
DNA polymorphism data. Bioinformatics 25: 1451-1452. 
doi:10.1093/bioinformatics/btp187. 

49. Haploid Clementine Genome. Available: http://www.phytozome.org/Clementine. 
Accessed 1 March 2014. 

50. Swofford DL (2003) PAUP*. Phylogenetic Analysis Using Parsimony (*and 
Other Methods). Version 4 [computer program]. Sunderland (Massachusetts): 
Sinauer Associates. 

51. Tajima F, Nei M (1984) Estimation of evolutionary distance between nucleotide 
sequences. Mol Biol Evol 1: 269-285. 

52. FigTree website. Available: http://tree.bio.ed.ac.uk/software/figtree/. 
Accessed 1 March 2014. 

53. Jamieson A, Taylor SC (1997) Comparisons of three probability formulae for 
parentage exclusion. Anim Genet 28: 397-400. 
doi: 10.1111/j.1365-2052.1997.00186.x. 

54. Wang C, Han J, Zhang Y, Kayesh E, Fang J, et al. (2012) Plant variety and 
cultivar identification: advances and prospects. CRC Cr Rev Biotechn 33(2): 
111-125. doi:10.3109/07388551.2012.675314. 

55. Werner FAO, Durstewitz G, Habermann FA, Thaller G, Kramer W, et al. 
(2004) Detection and characterization of SNPs useful for identity control and 
parentage testing in major European dairy breeds. Anim Genet 35: 44-49. 
doi:10.1046/j.1365-2052.2003.01071.x. 

56. Lee HY, Park MJ, Yoo J-E, Chung U, Han G-R, et al. (2005) Selection of 
twenty-four highly informative SNP markers for human identification and 
paternity analysis in Koreans. Forensic Sci Int 148: 107-112. 
doi:10.1016/j.forsciint.2004.04.073. 

57. Jones AG, Small CM, Paczolt KA, Ratterman NL (2010) A practical guide to 
methods of parentage analysis. Mol Ecol Resour 10: 6-30. 
doi: 10.1111/j.1755-0998.2009.02778.x. 

58. Jones B, Walsh D, Werner L, Fiumera A (2009) Using blocks of linked single 
nucleotide polymorphisms as highly polymorphic genetic markers for parentage 
analysis. Mol Ecol Resour 9: 487-497. doi:10.1111/j.1755-0998.2008.02444.x. 

59. Remington DL, Thornsberry JM, Matsuoka Y, Wilson LM, Whitt SR, et al. 
(2001) Structure of linkage disequilibrium and phenotypic associations in the 
maize genome. P Natl Acad Sci-Biol 98: 11479-11484. 

60. Clark AG, Hubisz MJ, Bustamante CD, Williamson SH, Nielsen R (2005) 
Ascertainment bias in studies of human genome-wide polymorphism. Genome 
Res 15: 1496-1502. doi: 10.1101/gr.4107905. 

61. Lachance J, Tishkoff SA (2013) SNP ascertainment bias in population genetic 
analyses: Why it is important, and how to correct it. Bioessays 35: 780-786. 
doi: 10.1002/bies.201300014. 

62. Pompanon F, Bonin A, Bellemain E, Taberlet P (2005) Genotyping errors: 
causes, consequences and solutions. Nat Rev Genet 6: 847-859. 
doi:10.1038/nrg1707. 

63. Smith BL, Lu CP, Bremer JRA (2010) High-resolution melting analysis 
(HRMA): a highly sensitive inexpensive genotyping alternative for population 
studies. Mol Ecol Resour 10: 193-196. doi: 10.1111/j.1755-0998.2009.02726.x. 

64. Distefano G, Caruso M, La Malfa S, Gentile A, Wu SB (2012) High resolution 
melting analysis is a more sensitive and effective alternative to gel-based 
platforms in analysis of SSR - An Example in Citrus. PLoS ONE 7(8): e44202. 
doi:10.1371/journal.pone.0044202. 

65. Distefano G, La Malfa S, Gentile A, Wu SB (2013) EST-SNP genotyping of 
citrus species using high-resolution melting curve analysis. Tree Genet Genomes 
9: 1271-1281. doi: 10.1007/s11295-013-0636-6. 

66. Breto MP, Ruiz C, Pina JA, Asins MJ (2001) The diversification of Citrus 
clementina Hort. ex Tan., a vegetatively propagated crop species. Mol 
Phylogenet Evol 21: 285-293. doi: 10.1006/mpev.2001.1008. 

67. Corazza-Nunes MJ, Machado MA, Nunes WMC, Cristofani M, Targon MLPN 
(2002) Assessment of genetic variability in grapefruits (Citrus paradisi Macf.) and 
pummelos (C. maxima (Burm.) Merr.) using RAPD and SSR markers. 
Euphytica 126: 169-176. doi: 10.1023/A:1016332030738. 

68. Hodgson RW (1967) Horticultural varieties of citrus. In: Reuther W, Webber 
HJ, Batchelor LD, editors. The Citrus Industry. California: University of 
California Press, Berkeley, 1: 431-591. 

69. Tan WQ, Zheng JL, Jian JQ, Diao JH, Li SN, et al. (2011) Genetic diversity 
analysis of Huazhoujuhong germplasm based on SSR molecular markers. Fujian 
Fruits 1: 7-10. 

70. Candolle AD (1886) Origin of cultivated plants (English translation), 2nd ed. 
New York: Hafner Publishing. 

71. Tanaka T (1961) Historical and geographical background to the course of 
development of present citrus industry. In: Citrologia. Osaka: Citrologia 
Supporting Foundation, pp 1-6. 

72. Gmitter FG, Hu XL (1990) The possible role of Yunnan, China, in the origin of 
contemporary Citrus species (Rutaceae). Econ Bot 44(2): 267-277. doi: 10.1007/ 
BF02860491. 






